ABSTRACT

We examine the clustering properties of H I-selected galaxies through an analysis of the H I Parkes All-Sky Survey Catalogue (Hicat) two-point correlation function. Various sub-samples are extracted from this catalogue to study the overall clustering of H I-rich galaxies and its dependence on luminosity, H I gas mass and rotational velocity. These samples cover the entire southern sky δ < 0°, containing up to 4,174 galaxies over the radial velocity range 300–12,700 km s⁻¹. A scale length of r₀ = 3.45 ± 0.25 h⁻¹ Mpc and slope of γ = 1.47 ± 0.08 are obtained for the H I-rich galaxy real-space correlation function, making gas-rich galaxies among the most weakly clustered objects known. H I-selected galaxies also exhibit weaker clustering than optically selected galaxies of comparable luminosities. Good agreement is found between our results and those of synthetic H I-rich galaxy catalogues generated from the Millennium Run CDM simulation. When Hicat is bisected using different parameter cuts, clustering is found to depend most strongly on rotational velocity and luminosity, while the dependence on H I mass is marginal. Splitting the sample around v_rot = 108 km s⁻¹, a scale length of r₀ = 2.86 ± 0.46 h⁻¹ Mpc is found for galaxies with low rotational velocities, compared to r₀ = 3.96 ± 0.33 h⁻¹ Mpc for the high rotational velocity sample.

Subject headings: cosmology: observations, cosmology: large-scale structure of 
the universe, cosmology: cosmological parameters, galaxies: statistics, galaxies: 
halos, radio lines: galaxies 
1. Introduction

The statistical analysis of galaxy clustering provides key information on the cosmological parameters of the universe and on the formation and evolution of galaxies. A simple way of parametrizing galaxy clustering is through the two-point correlation function in its various redshift-space, projected and real-space forms (Groth & Peebles 1977; Davis & Huchra 1982). With the advent of large-scale optical spectroscopic surveys such as the 2dF Galaxy Redshift Survey (2dFGRS; Colless et al. 2001) and the Sloan Digital Sky Survey (SDSS; York et al. 2000), the properties of the galaxy distribution can now be studied on cosmologically representative scales. The large sample sizes have also enabled clustering properties to be examined as a detailed function of parameters such as optical luminosity, morphology, star formation activity and color. From these studies it has been found that the strongest clustering is exhibited by galaxies with high luminosities (Norberg et al. 2001, 2002), red spectral energy distributions (Norberg et al. 2002; Zehavi et al. 2002, 2005), and relatively passive star formation (Madgwick et al. 2003).

In this work, we focus on galaxies that are selected not on their stellar content, but on the amount of cold gas they contain. Optical selection concentrates on the current stellar properties of galaxies, whereas H I selection identifies galaxies by their potential to form stars. These galaxies represent a population of more slowly evolving galaxies that still have a large fuel reservoir available for conversion into stars. The H I Parkes All-Sky Survey (Hipass) Catalogue (Hicat; Meyer et al. 2004) provides such a sample over the entire southern sky. As a blind H I survey, Hipass is also not biased by extinction, providing a unique view of regions such as the Zone of Avoidance that are difficult to observe in the optical.

This paper provides a detailed analysis of the Hicat two-point correlation function, building on earlier work by Meyer (2004) and Ryan-Weber (2006), which also examined the clustering properties of Hipass galaxies. Here, we extend these analyses with a strong emphasis on the dependence of clustering on various galaxy parameters.

The basic properties of Hicat are described in Section 2, with a discussion of the two-point correlation function technique following in Section 3. Sections 4 and 5 present the main results of this work, first covering the redshift-space, projected and real-space correlation functions, followed by an examination of the dependence of galaxy clustering on luminosity, H I mass and rotational velocity. Section 6 compares these results with synthetic catalogues drawn from the Millennium Run simulation. The results are discussed in Section 7, with a summary given in Section 8. A Hubble constant of H₀ = 100 km s⁻¹ Mpc⁻¹ is used throughout to allow comparison with existing published work.
2. Data

The galaxy data are taken from Hicat, the largest blind catalogue of H I sources compiled to date. Hicat accurately determines the position and redshift of the galaxies simultaneously, one of the unique benefits of an H I survey. We provide a brief description of this dataset here, referring the reader to the relevant catalogue and data papers for a full discussion (Barnes et al. 2001; Meyer et al. 2004; Zwaan et al. 2004).

Hicat contains 4,315 sources over the southern sky δ < +2°, spanning the velocity range 300 to 12,700 km s⁻¹. The catalogue was compiled from Hipass data using a combination of automatic and manual procedures. Observations for this survey were carried out from 1997 to 2000 with the Parkes 64-metre radio telescope. The dataset has a final spatial resolution of 15.5 arcmin and a velocity resolution of 18 km s⁻¹ following smoothing. The average noise for the data is 13 mJy beam⁻¹, with data at low Galactic latitudes having slightly elevated noise levels (Zwaan et al. 2004). The completeness and reliability of the sample were measured using fake sources added to the data and follow-up observations, respectively, as described in detail in Zwaan et al. (2004). From this, 99 per cent of inserted sources are retrieved for sources with peak flux > 84 mJy or integrated flux > 9.4 Jy km s⁻¹. Similarly, 99 per cent of catalogue sources are found to be real for peak fluxes > 58 mJy or integrated flux > 8.2 Jy km s⁻¹. Overall reliability for the entire catalogue is found to be 95 per cent. To give a feeling for the survey depth, Hicat has a complete (integrated-flux-limited) sampling of L* galaxies to a distance of ~40 h⁻¹ Mpc, although such galaxies are present in the catalogue to ~80 h⁻¹ Mpc.



3. The Two-Point Correlation Function 

The two-point correlation function, ξ, provides a simple measure of galaxy clustering. It is computed by comparing the number of galaxy pairs at different on-sky and radial separations (σ and π, respectively) in the real data sample with that in a randomly generated dataset. The random dataset is constructed to have the same selection function and boundaries as the real dataset, but with an unclustered Poissonian distribution of galaxies. The Hipass selection function is derived using a stepwise maximum likelihood technique (Zwaan et al. 2005). In particular, ξ(σ, π) is defined as the excess probability of finding a galaxy pair on the given scale (σ, π) compared to the random dataset. A value ξ(σ, π) = 1 thus corresponds to the real dataset having twice the probability of containing a galaxy pair on the specified scale compared to the random catalogue. In this work, the relations σ = [(v_i + v_j)/H₀] tan(θ/2) and π = |v_i − v_j|/H₀, where v_i and v_j are the radial velocities of the two galaxies and θ is their angular separation, are used to calculate the transverse and radial separations respectively (following Davis & Peebles 1983). Velocities for the real dataset are in the heliocentric frame of reference, which provides a first-order compromise between the Local Group and Cosmic Microwave Background standards of rest over the distances spanned by the Hicat sample (Meyer et al. 2006).
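As a minimal sketch (not the authors' code), the transverse and radial separations of a single pair can be computed as follows. It assumes sky positions in degrees, velocities in km s⁻¹ and H₀ = 100 km s⁻¹ Mpc⁻¹, so σ and π come out in h⁻¹ Mpc; the function names are illustrative.

```python
import numpy as np

H0 = 100.0  # km s^-1 Mpc^-1, so separations come out in h^-1 Mpc

def angular_separation(ra1, dec1, ra2, dec2):
    """Great-circle separation (radians) between sky positions given in degrees."""
    ra1, dec1, ra2, dec2 = map(np.radians, (ra1, dec1, ra2, dec2))
    # Haversine formula: numerically stable for the small angles of close pairs
    hav = (np.sin((dec2 - dec1) / 2.0) ** 2
           + np.cos(dec1) * np.cos(dec2) * np.sin((ra2 - ra1) / 2.0) ** 2)
    return 2.0 * np.arcsin(np.sqrt(hav))

def pair_separations(v_i, v_j, theta):
    """Transverse (sigma) and radial (pi) separations of a pair (Davis & Peebles 1983)."""
    sigma = (v_i + v_j) / H0 * np.tan(theta / 2.0)
    pi = np.abs(v_i - v_j) / H0
    return sigma, pi

theta = angular_separation(10.0, -30.0, 10.0, -29.0)  # a 1-degree separation
sigma, pi = pair_separations(3000.0, 3200.0, theta)
print(f"sigma = {sigma:.3f}, pi = {pi:.3f}")  # prints: sigma = 0.541, pi = 2.000
```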



3.1. Estimators

Galaxy pair counts in the two samples can be compared using a variety of techniques, or estimators. Historically, three main estimators have been used: those of Davis & Huchra (1982), Hamilton (1993) and Landy & Szalay (1993). In this work, we use the last of these. The Landy & Szalay estimator minimizes the effects of errors in the measurement of the sample mean galaxy density, as well as the problems arising from using the observed sample itself to measure the mean density. This estimator is given by:

$$\xi(\sigma,\pi) = 1 + \left(\frac{n_R}{n_D}\right)^2\frac{DD(\sigma,\pi)}{RR(\sigma,\pi)} - 2\,\frac{n_R}{n_D}\,\frac{DR(\sigma,\pi)}{RR(\sigma,\pi)} \tag{1}$$

where DD(σ, π) is the number of pairs at separations σ and π in the real data sample, DR(σ, π) is the number of pairs when matching the real data sample with the random catalogue, and RR(σ, π) is the number of pairs in the random catalogue at the specified separations. The values n_D and n_R are the galaxy number densities in the data and random samples respectively. These are needed to normalize pair counts between the two catalogues, as the random sample is generated to contain many more points than the data sample in order to reduce statistical variation.

For non-volume-limited samples, such as the one used here, an additional problem arises from the sample selection function. As noted by Ratcliffe et al. (1998), if no pair weighting scheme is used, pair counts are dominated by galaxies at the peak of the survey selection function, effectively reducing the survey volume. On the other hand, if inverse selection function weighting is used, pairs at high velocities, where galaxy counts are low, dominate. A compromise is a weighting that minimizes the variance in the estimate of ξ, as discussed in Davis & Huchra (1982) and Hamilton (1993). In this case, rather than weighting each pair count equally (DD, DR and RR summed using w_ij = 1), the weight for a given pair with redshift-space separation s = (σ² + π²)^(1/2) is calculated by multiplying individual galaxy weights (w_ij = w_i w_j), which are given by (Efstathiou 1988; Hawkins et al. 2003):
$$w_i = w(r_i, s) = \frac{1}{1 + 4\pi n_D\,\phi(r_i)\,J_3(s)} \tag{2}$$



where φ(r_i) is the survey selection function at the distance r_i = v_i/H₀ of the galaxy under consideration. For close galaxies, where the selection function is large, this weighting has the desired property w_i ∝ 1/φ(r_i), whereas at large distances, where the selection function is small, w_i ≈ 1. J₃(s) is defined by

$$J_3(s) = \int_0^{s}\xi(s')\,s'^2\,ds' \tag{3}$$

This requires the redshift-space correlation function ξ(s), which is one of the quantities we are trying to determine. However, for the calculation of weights it is sufficient to assume a power-law form ξ(s) = (s/s₀)^(−γ), where the values s₀ = 5.0 h⁻¹ Mpc and γ = 1.8 are used. Furthermore, ξ(s) is set to zero for s > 30 Mpc (see e.g. Fisher et al. 1994). As noted by a number of authors, results are not sensitive to the exact form of J₃(s) (Hawkins et al. 2003; Ratcliffe et al. 1998; Zehavi et al. 2002). To avoid excess noise in the measured correlation functions through the over-weighting of a few distant galaxies, we found it necessary to limit the Hicat sample to v < 6000 km s⁻¹ for the weighted samples. This reduces the sample size to 3820 galaxies in the weighted analysis. The galaxy number density, n_D, is calculated according to (Davis & Huchra 1982; Willmer 1997):

$$n_D = \frac{\sum_{i=1}^{N_D} w(r_i, s = 30\,{\rm Mpc})}{\int dV\,\phi(r)\,w(r, s = 30\,{\rm Mpc})} \tag{4}$$

where the sum is taken over all galaxies with φ(r_i) > 0.001, and the volume integral is restricted equivalently (limiting errors caused by sparse sampling). This expression for n_D is circular (through w), but n_D converges rapidly if the expressions are evaluated iteratively.

When using this weighting scheme, the normalizing ratio of the data and random number densities can be calculated using (Davis & Huchra 1982; Fisher et al. 1994):

$$\frac{n_D}{n_R} = \frac{\sum_{i=1}^{N_D} w_i(r_i, s = 30\,{\rm Mpc})}{\sum_{j=1}^{N_R} w_j(r_j, s = 30\,{\rm Mpc})} \tag{5}$$

where N_D and N_R are the number of objects in the real and random samples respectively. Results from both the weighted and unweighted schemes are presented in this work. Only galaxy pairs with angular separations < 50° are included in the work presented here (Davis & Peebles 1983).
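The minimum-variance weights w_i = 1/[1 + 4π n_D φ(r_i) J₃(s)] and the iterative estimate of n_D described above can be sketched as below. This is an illustrative sketch, not the authors' code: it assumes the power-law ξ(s) = (s/s₀)^(−γ) with s₀ = 5.0 and γ = 1.8, set to zero beyond 30 Mpc as in the text (which gives J₃ a closed form), a selection function tabulated on a radial grid, and a hypothetical `solid_angle` input for the survey area. All helper names are ours; pair weights would then be w_ij = w_i w_j.

```python
import numpy as np

S0, GAMMA, S_MAX = 5.0, 1.8, 30.0  # power-law xi(s) used only for the weights

def j3(s, s0=S0, gamma=GAMMA, s_max=S_MAX):
    """J_3(s) = int_0^s xi(s') s'^2 ds' for xi = (s/s0)^-gamma, with xi = 0 beyond s_max."""
    s_eff = np.minimum(s, s_max)
    return s0 ** gamma * s_eff ** (3.0 - gamma) / (3.0 - gamma)

def galaxy_weights(phi, n_d, s):
    """Minimum-variance weights w_i = 1 / (1 + 4 pi n_D phi(r_i) J_3(s))."""
    return 1.0 / (1.0 + 4.0 * np.pi * n_d * np.asarray(phi, float) * j3(s))

def number_density(phi_gal, r_grid, phi_grid, solid_angle, s=S_MAX,
                   n_init=0.01, phi_min=1e-3, tol=1e-10, max_iter=200):
    """Iterate n_D = sum_i w_i / int dV phi(r) w(r) to a fixed point.

    phi_gal: selection function at each galaxy's distance r_i = v_i / H0.
    r_grid, phi_grid: tabulated selection function (distances in h^-1 Mpc).
    solid_angle: survey area in steradians (hypothetical input), dV = Omega r^2 dr.
    Galaxies and volume with phi <= phi_min are excluded, as in the text.
    """
    phi_gal = np.asarray(phi_gal, float)
    phi_gal = phi_gal[phi_gal > phi_min]
    r_grid = np.asarray(r_grid, float)
    phi_grid = np.asarray(phi_grid, float)
    n_d = n_init
    for _ in range(max_iter):
        integrand = np.where(phi_grid > phi_min,
                             solid_angle * r_grid ** 2 * phi_grid
                             * galaxy_weights(phi_grid, n_d, s), 0.0)
        # Trapezoidal rule for the volume integral
        volume = np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(r_grid))
        n_new = galaxy_weights(phi_gal, n_d, s).sum() / volume
        if abs(n_new - n_d) <= tol * n_new:
            return n_new
        n_d = n_new
    return n_d
```

With a constant selection function the weights cancel and the iteration returns N/V, which is a convenient sanity check on the volume integral.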

Like early optical redshift surveys, Hipass spans a relatively small volume and has significant structure on scales comparable to that of the survey region. Particularly notable are two large-scale structure features at the peak of the Hicat radial velocity distribution. This raises the possibility that the specific location of these structures may influence our clustering results, i.e. the survey region may not be truly representative of the homogeneous universe on larger scales. However, the weighting scheme described above aims to maximize the volume contributing to the measured clustering, while still utilizing the more significant number counts at closer distances. Our analysis of synthetic catalogues also indicates that this effect should not significantly bias our results (see Section 6).

A final effect that may influence Hicat results is source confusion due to the relatively large Hipass beam (15.5 arcmin). At the weighted-sample distance limit of 60 h⁻¹ Mpc, the beam size corresponds to ~0.3 h⁻¹ Mpc (cf. 0.27 h⁻¹ Mpc for the smallest separation bins examined here), and to less than half that for galaxies at the peak of the Hipass redshift distribution.
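A brute-force sketch of weighted pair counting and the Landy & Szalay estimator, ξ = 1 + (n_R/n_D)² DD/RR − 2 (n_R/n_D) DR/RR, is given below. It is our illustration rather than the authors' pipeline and is only practical for small toy catalogues, since the cost scales as N_D × N_R. Auto-pairs are counted as ordered pairs (i ≠ j), the convention under which the estimator carries the factor of 2; pairs wider than 50° are dropped as in the text, and n_R/n_D is taken from the summed weights. Function names and the catalogue tuple layout are hypothetical.

```python
import numpy as np

H0 = 100.0  # km s^-1 Mpc^-1

def unit_vectors(ra_deg, dec_deg):
    ra, dec = np.radians(ra_deg), np.radians(dec_deg)
    return np.stack([np.cos(dec) * np.cos(ra), np.cos(dec) * np.sin(ra), np.sin(dec)], axis=-1)

def pair_counts(cat1, cat2, sigma_edges, pi_edges, auto, max_sep_deg=50.0):
    """Weighted pair counts sum(w_i w_j) in (sigma, pi) bins.

    Each catalogue is a tuple (ra_deg, dec_deg, v_kms, weight) of 1-D arrays.
    For auto-counts, ordered pairs with i != j are used.
    """
    ra1, dec1, v1, w1 = (np.asarray(a, float) for a in cat1)
    ra2, dec2, v2, w2 = (np.asarray(a, float) for a in cat2)
    # arccos of the dot product is adequate down to ~arcsec separations
    cos_theta = np.clip(unit_vectors(ra1, dec1) @ unit_vectors(ra2, dec2).T, -1.0, 1.0)
    theta = np.arccos(cos_theta)
    sigma = (v1[:, None] + v2[None, :]) / H0 * np.tan(theta / 2.0)
    pi = np.abs(v1[:, None] - v2[None, :]) / H0
    keep = theta < np.radians(max_sep_deg)
    if auto:
        keep &= ~np.eye(len(v1), dtype=bool)  # drop self-pairs
    weights = (w1[:, None] * w2[None, :])[keep]
    counts, _, _ = np.histogram2d(sigma[keep], pi[keep],
                                  bins=[sigma_edges, pi_edges], weights=weights)
    return counts

def landy_szalay(dd, dr, rr, nr_over_nd):
    """xi = 1 + (n_R/n_D)^2 DD/RR - 2 (n_R/n_D) DR/RR; NaN where RR = 0."""
    dd, dr, rr = (np.asarray(a, float) for a in (dd, dr, rr))
    with np.errstate(divide="ignore", invalid="ignore"):
        xi = 1.0 + nr_over_nd ** 2 * dd / rr - 2.0 * nr_over_nd * dr / rr
    return np.where(rr > 0, xi, np.nan)

def density_ratio(w_data, w_random):
    """n_R / n_D from summed weights (N_R / N_D when all weights are 1)."""
    return np.sum(w_random) / np.sum(w_data)

# Toy usage: one pair with sigma ~ 0.54 and pi = 2 h^-1 Mpc, counted in both orders
data = (np.array([10.0, 10.0]), np.array([-30.0, -29.0]),
        np.array([3000.0, 3200.0]), np.ones(2))
dd = pair_counts(data, data, [0.0, 1.0, 2.0], [0.0, 5.0], auto=True)
```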



3.2. Random Samples 

The random samples in this study are generated to match the underlying radial velocity distribution of the real datasets. This is done using kernel density estimation, which finds the optimal Gaussian kernel width with which to smooth the data (Wand & Jones 1995). On-sky positions are drawn at random, uniformly over the survey area. The normalised radial velocity distributions of the full catalogue and its random sample are shown in Figure 1. As a check, this method was compared with that of overlaying the Hicat completeness function onto a volume-limited random sample of galaxies with H I masses, peak fluxes and rotational velocities derived from the H I mass function. While giving correlation function results consistent with those from the kernel density estimation method, this method was not used due to its model dependence.
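One way to draw such a random catalogue is sketched below under stated assumptions. Silverman's rule-of-thumb bandwidth stands in for the optimal bandwidth selection of Wand & Jones (1995); sky positions are drawn uniformly on the sphere below an assumed declination limit of +2° (the Hicat limit; any masked regions are ignored); and draws outside 300–12,700 km s⁻¹ are rejected, which slightly truncates the smoothed distribution at the edges. The function names are hypothetical.

```python
import numpy as np

def silverman_bandwidth(x):
    """Rule-of-thumb Gaussian kernel width (a stand-in for an optimal selector)."""
    x = np.asarray(x, float)
    q75, q25 = np.percentile(x, [75, 25])
    spread = min(x.std(ddof=1), (q75 - q25) / 1.34)
    return 0.9 * spread * x.size ** (-0.2)

def random_catalogue(v_data, n_random, dec_max_deg=2.0, v_min=300.0,
                     v_max=12700.0, seed=None):
    """Random (ra, dec, v) whose velocities follow a Gaussian KDE of v_data."""
    rng = np.random.default_rng(seed)
    v_data = np.asarray(v_data, float)
    h = silverman_bandwidth(v_data)
    chunks, n_have = [], 0
    while n_have < n_random:
        # Sampling a Gaussian KDE: pick a data velocity, then add N(0, h^2) noise
        draw = rng.choice(v_data, size=n_random) + rng.normal(0.0, h, size=n_random)
        draw = draw[(draw >= v_min) & (draw <= v_max)]  # reject outside velocity range
        chunks.append(draw)
        n_have += draw.size
    v = np.concatenate(chunks)[:n_random]
    ra = rng.uniform(0.0, 360.0, size=n_random)
    # Uniform on the sphere: sin(dec) is uniform between -1 and sin(dec_max)
    sin_dec = rng.uniform(-1.0, np.sin(np.radians(dec_max_deg)), size=n_random)
    return ra, np.degrees(np.arcsin(sin_dec)), v
```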
4. HICAT Galaxy Clustering

4.1. Two-Dimensional Redshift-Space Correlation Function 

Figure 2 plots the two-dimensional redshift-space correlation function diagrams for the 
sample, showing both the weighted and unweighted versions. These diagrams have been 
created by mirroring the original calculated function into each of the other three quadrants. 
Contours are fitted at logarithmic intervals to a smoothed version of the correlation function 
images. 

There are two distortions caused by the peculiar velocities of galaxies that are commonly observed in these diagrams for optically-selected galaxy samples. On small transverse (σ) scales, the correlation function contours are stretched from their real-space circular shape outward in the radial (π) direction. This is the non-linear 'Finger-of-God' effect, caused by the line-of-sight velocity dispersion of galaxies in gravitationally bound structures such as galaxy groups and clusters. The second redshift-space distortion is the linear large-scale flattening of the correlation function contours in the π direction, caused by the coherent infall of galaxies into large-scale overdensities (Kaiser 1987).

Both of these effects can be seen in the Hicat sample. The Finger-of-God effect is best seen in the unweighted samples, which are dominated by nearby galaxy pairs, and the large-scale infall is most apparent in the weighted correlation functions, where the catalogue effective volume is larger. The line-of-sight velocity dispersion effects are likely to be due to H I-rich galaxies in less dense gravitationally bound concentrations such as galaxy groups. Objects in larger concentrations, such as galaxy clusters, will contribute less to the observed dispersion effect than in optically-selected samples, given the relative paucity of H I-rich galaxies in cluster environments (Waugh et al. 2002). The large-scale coherent infall of H I-rich galaxies is only sampled on relatively small scales, given the shallow nature of Hicat.

4.2. Redshift-Space Correlation Function 

To compare the correlation results of Hicat with those of optically-selected samples in a single dimension, the first cut through the two-dimensional redshift-space correlation function that can be examined is the redshift-space correlation function, ξ(s). This is constructed by taking the radial average of the two-point correlation function ξ(σ, π), defining s = (σ² + π²)^(1/2) as before.

Errors are measured using jackknife re-sampling (see Lupton 1993), dividing the sample under consideration into 24 RA bins. The redshift-space correlation function is then re-measured 24 times, each time leaving out one bin of RA. The error in a given redshift-space separation bin s is given by (N = 24):

$$\sigma_\xi^2(s) = \frac{N-1}{N}\sum_{i=1}^{N}\left[\xi_i(s) - \bar{\xi}(s)\right]^2 \tag{6}$$

where ξ_i(s) is the correlation function measured with the i-th RA bin omitted and ξ̄(s) is the mean of the N jackknife measurements.

Jackknife errors take into account random errors and, to some degree, those due to cosmic variance, although measurement of these latter errors is limited by the small Hicat effective volume. We do not determine the full covariance matrix for the binned correlation function data (the bins themselves are not independent) due to the small sample size. However, Section 6 provides further analysis of our error estimates through an examination of simulated H I galaxy catalogues. Systematic errors are not taken into account.

Weighted and unweighted results are given in Figure 3, with 2dFGRS results also included. It is clear that the Hicat galaxies are more weakly clustered than the 2dFGRS galaxies on all scales < 30 h⁻¹ Mpc. In the following sections we explore this offset in more detail, using the projected two-point correlation function to obtain the real-space correlation function for H I-selected galaxies.

4.3. Projected Correlation Function

A difficulty of the redshift-space correlation function is that it is still affected by the peculiar velocities of galaxies, which may differ between H I- and optically-selected samples. To compare the true spatial clustering of these galaxies, it is necessary to examine the clustering properties free of redshift-space distortions. One way to do this is through the projected correlation function, measured by integrating the two-dimensional redshift-space correlation function along the π axis:

$$\Xi(\sigma) = 2\int_0^{D_{\rm lim}}\xi(\sigma,\pi)\,d\pi \tag{7}$$

The upper limit D_lim is chosen at the point where the integral converges. In this work the limit D_lim = 25 h⁻¹ Mpc is used, with the integrals broadly reaching a plateau at this point in the weighted samples (see Figure 4). It should be noted that for the unweighted samples, the integrals do not completely converge over the range of separation scales probed here. This point is discussed further in Section 4.4.1.
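As an illustration, when ξ(σ, π) is tabulated in bins the π integral becomes a sum over bins. The sketch below (our own, with hypothetical names) truncates the sum at D_lim = 25 h⁻¹ Mpc, cutting the bin that straddles the limit, and also returns the running integral, the quantity whose plateau is used to judge convergence.

```python
import numpy as np

def projected_correlation(xi_sigma_pi, pi_edges, d_lim=25.0):
    """Xi(sigma) = 2 * sum_k xi(sigma, pi_k) * dpi_k over pi bins below d_lim.

    xi_sigma_pi: array of shape (n_sigma, n_pi) of binned xi(sigma, pi).
    pi_edges: the n_pi + 1 bin edges along pi (h^-1 Mpc).
    Returns Xi(sigma) and the cumulative integral versus the upper pi edge.
    """
    xi = np.asarray(xi_sigma_pi, float)
    edges = np.asarray(pi_edges, float)
    # Bin widths, with the bin straddling d_lim cut off at d_lim
    widths = np.clip(np.minimum(edges[1:], d_lim) - edges[:-1], 0.0, None)
    running = 2.0 * np.cumsum(xi * widths, axis=1)
    return running[:, -1], running

# Toy check: xi = 1 everywhere gives Xi = 2 * d_lim
edges = np.arange(0.0, 42.0, 2.0)  # 0, 2, ..., 40 -> 20 bins
Xi, running = projected_correlation(np.ones((3, 20)), edges)
```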
Errors are calculated for the projected correlation functions using jackknife re-sampling as before. The resultant correlation functions are shown in Figure 5. The projected correlation function is much closer to a power law in shape than the redshift-space correlation function.
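The jackknife recipe (24 RA bins, leave one out, variance scaled by (N − 1)/N) can be sketched as below. The `estimator` callable is a hypothetical stand-in for whatever computes the binned correlation function from a subset of the data; in a real analysis the same RA region would also be removed from the random catalogue. Only the resampling and the variance formula are illustrated.

```python
import numpy as np

def ra_regions(ra_deg, n_regions=24):
    """Assign each object to one of n_regions equal-width RA bins."""
    width = 360.0 / n_regions
    return np.floor(np.asarray(ra_deg, float) / width).astype(int) % n_regions

def jackknife_error(estimator, ra_deg, n_regions=24):
    """sqrt((N - 1)/N * sum_i (xi_i - mean)^2), with xi_i the estimate
    obtained when RA region i is left out.

    estimator(keep_mask) must return the binned statistic (1-D array)
    computed from the objects where keep_mask is True.
    """
    region = ra_regions(ra_deg, n_regions)
    samples = np.array([estimator(region != k) for k in range(n_regions)])
    mean = samples.mean(axis=0)
    n = n_regions
    return np.sqrt((n - 1) / n * np.sum((samples - mean) ** 2, axis=0))
```

For a sample mean with one object per region, this reproduces the familiar standard error s/√N, which makes a handy check of the implementation.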



4.4. Real-Space Correlation Function 



Two methods are used to obtain the non-projected real-space correlation function. First, 
a power-law form is assumed for the real-space correlation function, and second the projected 
correlation function is inverted to retrieve the real-space correlation function without this 
assumption. 



4.4.1. Correlation Function Assuming Power-Law

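Following Davis & Peebles (1983) and Norberg et al. (2001), if the integral up to D_lim in Equation 7 includes nearly all correlated pairs, the projected correlation function is related to the real-space correlation function ξ(r) by:

$$\frac{\Xi(\sigma)}{\sigma} = \frac{2}{\sigma}\int_{\sigma}^{\infty}\xi(r)\,\frac{r\,dr}{(r^2 - \sigma^2)^{1/2}} \tag{8}$$

Assuming the real-space correlation function has a power-law form ξ(r) = (r/r₀)^(−γ), the above integral can be evaluated in terms of gamma functions, giving:

$$\frac{\Xi(\sigma)}{\sigma} = \left(\frac{r_0}{\sigma}\right)^{\gamma}\frac{\Gamma(1/2)\,\Gamma[(\gamma-1)/2]}{\Gamma(\gamma/2)} \tag{9}$$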
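The gamma-function result can be checked numerically. The sketch below (illustrative only, not part of the original analysis) evaluates the projection integral for a power law using the substitution r = σ cosh t, which removes the integrable singularity at r = σ, and compares it with the closed form (r₀/σ)^γ Γ(1/2) Γ[(γ−1)/2] / Γ(γ/2), valid for γ > 1.

```python
import math
import numpy as np

def gamma_factor(gamma):
    """Gamma(1/2) Gamma((gamma - 1)/2) / Gamma(gamma/2), for gamma > 1."""
    return math.gamma(0.5) * math.gamma((gamma - 1.0) / 2.0) / math.gamma(gamma / 2.0)

def projected_over_sigma(xi, sigma, t_max=100.0, n=200_001):
    """Xi(sigma)/sigma = (2/sigma) int_sigma^inf xi(r) r dr / sqrt(r^2 - sigma^2).

    With r = sigma cosh(t) the integrand becomes xi(sigma cosh t) sigma cosh t,
    which is finite at t = 0; the trapezoidal rule is applied on [0, t_max].
    """
    t = np.linspace(0.0, t_max, n)
    r = sigma * np.cosh(t)
    f = xi(r) * r
    integral = np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(t))
    return 2.0 * integral / sigma

r0, gamma, sigma = 3.45, 1.47, 2.0
numeric = projected_over_sigma(lambda r: (r / r0) ** (-gamma), sigma)
closed = (r0 / sigma) ** gamma * gamma_factor(gamma)
print(numeric, closed)
```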



To calculate r₀ and γ, the general power-law form Ξ(σ) = a₁σ^(a₂) is fitted to the projected correlation function using the Levenberg–Marquardt nonlinear least-squares method (Press et al. 1992). Resultant fits are shown in Figure 6. Only points with σ < 10 h⁻¹ Mpc are used in these fits, to restrict the data to the power-law part of the plotted correlation functions. The parameters r₀ and γ are then given by:

$$r_0 = \left[\frac{a_1\,\Gamma[(1-a_2)/2]}{\Gamma(1/2)\,\Gamma(-a_2/2)}\right]^{1/(1-a_2)} \tag{10}$$

$$\gamma = 1 - a_2 \tag{11}$$
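A sketch of this fit using SciPy's `curve_fit` with `method="lm"` (a Levenberg–Marquardt implementation) is shown below. The synthetic noise-free input, the 5 per cent errors, the starting guess and the function names are our assumptions, not the authors' setup; note that `curve_fit`'s `sigma=` keyword means the data uncertainties, not the separation σ.

```python
import math
import numpy as np
from scipy.optimize import curve_fit

def gamma_factor(gamma):
    """Gamma(1/2) Gamma((gamma - 1)/2) / Gamma(gamma/2), for gamma > 1."""
    return math.gamma(0.5) * math.gamma((gamma - 1.0) / 2.0) / math.gamma(gamma / 2.0)

def power_law(sep, a1, a2):
    return a1 * sep ** a2

def fit_r0_gamma(sep, xi_proj, xi_err, sep_max=10.0, p0=(10.0, -0.5)):
    """Fit Xi(sigma) = a1 sigma^a2 for sigma < sep_max, then convert to (r0, gamma)."""
    sep, xi_proj, xi_err = (np.asarray(a, float) for a in (sep, xi_proj, xi_err))
    use = sep < sep_max
    popt, pcov = curve_fit(power_law, sep[use], xi_proj[use], p0=p0,
                           sigma=xi_err[use], absolute_sigma=True, method="lm")
    a1, a2 = popt
    gamma = 1.0 - a2
    r0 = (a1 / gamma_factor(gamma)) ** (1.0 / gamma)
    return r0, gamma, popt, pcov

# Noise-free synthetic Xi(sigma) = r0^gamma A(gamma) sigma^(1 - gamma)
sep = np.logspace(np.log10(0.3), np.log10(20.0), 25)
true_r0, true_gamma = 3.45, 1.47
xi_proj = true_r0 ** true_gamma * gamma_factor(true_gamma) * sep ** (1.0 - true_gamma)
r0, gamma, popt, pcov = fit_r0_gamma(sep, xi_proj, 0.05 * xi_proj)
print(f"r0 = {r0:.3f}, gamma = {gamma:.3f}")
```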



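Errors on r₀ and γ are calculated by taking the square roots of the diagonal elements of the measured covariance matrix as the errors on a₁ and a₂, then propagating appropriately. The off-diagonal term σ_{a₁a₂} is also included, since r₀ and γ are not independent in the fitting process:

$$\sigma_{r_0} = \left[\left(\frac{\partial r_0}{\partial a_1}\right)^2\sigma_{a_1}^2 + \left(\frac{\partial r_0}{\partial a_2}\right)^2\sigma_{a_2}^2 + 2\,\frac{\partial r_0}{\partial a_1}\,\frac{\partial r_0}{\partial a_2}\,\sigma_{a_1a_2}\right]^{1/2} \tag{12}$$

$$\sigma_{\gamma} = \left|\frac{\partial\gamma}{\partial a_2}\right|\sigma_{a_2} = \sigma_{a_2} \tag{13}$$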
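The propagation can be sketched as follows. We substitute central finite differences for the analytic partial derivatives of r₀ (an assumption of this sketch, not necessarily the authors' approach), keep the off-diagonal covariance term, and use σ_γ = σ_{a₂} since γ = 1 − a₂. In practice `cov` would be the covariance matrix returned by the fit, for example `pcov` from `scipy.optimize.curve_fit`.

```python
import math
import numpy as np

def gamma_factor(gamma):
    return math.gamma(0.5) * math.gamma((gamma - 1.0) / 2.0) / math.gamma(gamma / 2.0)

def r0_from(a1, a2):
    """r0 = [a1 / A(1 - a2)]^(1 / (1 - a2)) for Xi(sigma) = a1 sigma^a2."""
    gamma = 1.0 - a2
    return (a1 / gamma_factor(gamma)) ** (1.0 / gamma)

def r0_gamma_errors(a1, a2, cov, rel_step=1e-6):
    """Propagate the (a1, a2) covariance to (sigma_r0, sigma_gamma),
    including the off-diagonal term; derivatives by central differences."""
    cov = np.asarray(cov, float)
    h1 = rel_step * abs(a1)
    h2 = rel_step * max(abs(a2), 1.0)
    d_a1 = (r0_from(a1 + h1, a2) - r0_from(a1 - h1, a2)) / (2.0 * h1)
    d_a2 = (r0_from(a1, a2 + h2) - r0_from(a1, a2 - h2)) / (2.0 * h2)
    var_r0 = (d_a1 ** 2 * cov[0, 0] + d_a2 ** 2 * cov[1, 1]
              + 2.0 * d_a1 * d_a2 * cov[0, 1])
    sigma_gamma = math.sqrt(cov[1, 1])  # |d gamma / d a2| = 1
    return math.sqrt(var_r0), sigma_gamma

a1 = 3.45 ** 1.47 * gamma_factor(1.47)  # a1 for r0 = 3.45, gamma = 1.47
print(r0_gamma_errors(a1, -0.47, [[0.04, 0.001], [0.001, 0.0064]]))
```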



Final parameter results for the weighted and unweighted samples are given in Table 1. As discussed earlier, there is still some dependence on the π-axis integration limit for the projected correlation function in the case of the unweighted sample: changing this limit from 25 h⁻¹ Mpc to 35 h⁻¹ Mpc alters the measured clustering scale length from 2.70 ± 0.21 h⁻¹ Mpc to 3.05 ± 0.23 h⁻¹ Mpc (cf. 3.45 ± 0.25 h⁻¹ Mpc to 3.56 ± 0.23 h⁻¹ Mpc for the weighted sample). Also included for comparison in Table 1 are the 2dFGRS results of Norberg et al. (2002), examining clustering as a function of luminosity and spectral type, and the recent SDSS results of Zehavi et al. (2005), also investigating clustering as a function of luminosity. The quoted Norberg et al. (2002) results correspond to the most strongly and most weakly clustered magnitude ranges for both early- and late-type galaxies. The SDSS faint and bright results are those at the extreme ends of the measured luminosity distribution.

From the correlation function parameter values, it can be seen that the Hicat scale lengths are all smaller than the 2dFGRS values, although within the errors of faint late-type galaxies. Hicat galaxies have similar r₀ values to those of SDSS galaxies with optical luminosities −19 < M_r < −18 (r₀ = 3.51 ± 0.32 h⁻¹ Mpc; Zehavi et al. 2005), though less luminous SDSS galaxies have even lower values of r₀ (see Table 1). In all cases, the Hicat correlation function exhibits a flatter slope than optically selected samples, reducing the comparative clustering strength of Hicat galaxies on small scales.
To further compare Hicat results with those of optically-selected samples, we examine the luminosity distribution of Hicat galaxies using the Hicat optical counterpart catalogue (Hopcat; Doyle et al. 2005). Selecting those galaxies with good optical matches and photometry (74 per cent of galaxies in the sample), the resulting distribution for galaxies in the weighted Hicat full sample is shown in the left-hand panel of Figure 7. The right-hand panel of Figure 7 plots the weighted full-catalogue Hicat clustering result against the luminosity-dependent clustering scale lengths of the 2dFGRS (Norberg et al. 2002). From this it can be seen that Hicat galaxies span a large range of optical luminosities, and not just the lowest luminosity range most consistent with the Hicat clustering scale length. Combined with the flatter slope of the Hicat correlation function, this indicates that H I-rich galaxies exhibit weaker clustering than 2dFGRS and SDSS galaxies of comparable optical luminosities, with this effect most pronounced on scales ≲ 1 Mpc.

We calculate the correlation function for a volume-limited sub-sample of Hicat as a test of the robustness of our technique. Such a sub-sample avoids the need for any galaxy pair weighting scheme, as the selection function is constant. We did not use this sub-sample more generally, as it restricts the sample size and hence the accuracy with which the correlation function parameters can be determined. However, a volume-limited sample nevertheless provides a useful check on the full-catalogue results. Applying the parameter cuts M_HI > 10^9.05 h⁻² M☉ and D < 30 h⁻¹ Mpc, we retrieve a correlation function with power-law parameters r₀ = 3.2 ± 1.4 h⁻¹ Mpc and γ = 1.5 ± 1.1, in agreement with those of the full sample, although with considerably larger uncertainties.



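4.4.2. Correlation Function Without Power-Law Assumption

The previous section assumes a power-law shape for the correlation function. However, the real-space correlation function can alternatively be derived using the methods of Saunders et al. (1992) and Hawkins et al. (2003), inverting the projected correlation function numerically to obtain the real-space correlation function without making this assumption. This provides an independent test of the shape of the real-space correlation function. Rearranging Equation 8 gives:

$$\xi(r) = -\frac{1}{\pi}\int_r^{\infty}\frac{d\Xi(\sigma)}{d\sigma}\,\frac{d\sigma}{(\sigma^2 - r^2)^{1/2}} \tag{14}$$

Assuming Ξ(σ) to have a step-function form with values Ξ_j at logarithmic intervals with centres at σ_j, and interpolating between these values, the above integral can be evaluated at r = σ_i:

$$\xi(\sigma_i) = -\frac{1}{\pi}\sum_{j\ge i}\frac{\Xi_{j+1}-\Xi_j}{\sigma_{j+1}-\sigma_j}\,\ln\left[\frac{\sigma_{j+1} + \left(\sigma_{j+1}^2 - \sigma_i^2\right)^{1/2}}{\sigma_j + \left(\sigma_j^2 - \sigma_i^2\right)^{1/2}}\right] \tag{15}$$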



The sum in the above expression (and hence the recovered correlation function) is truncated at 30 h⁻¹ Mpc. Although exhibiting a significant amount of noise, the results of this inversion are in excellent agreement with the correlation function determined in the previous section assuming a power-law form (Figure 8; points correspond to the inversion method and the dotted line to the power-law result).
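The discrete inversion can be sketched as below. Ξ is interpolated linearly between the bin centres σ_j, so each segment contributes its slope times the exact integral of (σ² − r²)^(−1/2) across it, and the sum stops at the last bin (which here plays the role of the 30 h⁻¹ Mpc truncation). This is our illustration; applied to a noise-free power law with the gamma-function projection, it closely recovers the input at small σ, with the truncation biasing the result low as σ approaches the last bin.

```python
import math
import numpy as np

def invert_projected(sigma, xi_proj):
    """Real-space xi at each bin centre sigma_i from binned Xi(sigma_j):

    xi(sigma_i) = -(1/pi) sum_{j>=i} (Xi_{j+1} - Xi_j) / (sigma_{j+1} - sigma_j)
                  * ln[(sigma_{j+1} + sqrt(sigma_{j+1}^2 - sigma_i^2))
                       / (sigma_j + sqrt(sigma_j^2 - sigma_i^2))]
    The last bin has no segment beyond it, so it is returned as NaN.
    """
    sigma = np.asarray(sigma, float)
    xi_proj = np.asarray(xi_proj, float)
    slopes = np.diff(xi_proj) / np.diff(sigma)
    xi = np.full(sigma.size, np.nan)
    for i in range(sigma.size - 1):
        lo, hi = sigma[i:-1], sigma[i + 1:]
        s2 = sigma[i] ** 2
        log_term = np.log((hi + np.sqrt(hi ** 2 - s2))
                          / (lo + np.sqrt(np.maximum(lo ** 2 - s2, 0.0))))
        xi[i] = -np.sum(slopes[i:] * log_term) / np.pi
    return xi

# Noise-free power-law test input, truncated at 30 h^-1 Mpc
r0, gamma = 3.45, 1.47
A = math.gamma(0.5) * math.gamma((gamma - 1) / 2) / math.gamma(gamma / 2)
sigma = np.logspace(-1, np.log10(30.0), 100)
xi_proj = r0 ** gamma * A * sigma ** (1 - gamma)
xi = invert_projected(sigma, xi_proj)
```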



5. Clustering Dependencies

From previous studies (Norberg et al. 2001, 2002; Zehavi et al. 2005) it has been found that galaxy clustering varies as a function of luminosity, with the most luminous galaxies showing the strongest clustering. A great strength of an H I-selected catalogue is that it uniquely provides the additional ability to study clustering as a function of gas content (via H I mass) and halo mass (via rotational velocity). We use optical data from Hopcat to compare these dependencies with the observed luminosity trend. Results are summarised in Table 2, with a more detailed description for each of the parameters given below.

5.1. Luminosity

The dependence of galaxy clustering on luminosity is examined by dividing the sample in two around a luminosity of B_J = −19.5. As before, magnitudes are calculated using measurements from Hopcat. Galaxies that are not matched, or that have data of insufficient quality, are given random magnitudes generated from the observed luminosity distribution (26 per cent of galaxies in the weighted sample). Random magnitudes are used rather than those from an estimation method, such as deriving luminosities from the observed H I masses, as the latter could make it difficult to disentangle the different clustering dependencies. These galaxies will dilute the observed clustering dependence, but are retained in the calculation to ensure on-sky consistency between the real and random catalogues. The radial velocity distribution histograms for the sub-samples and their generated random catalogues are shown in Figure 9. The left-hand panel of Figure 10 shows the fitted projected correlation functions, yielding clustering parameters of r₀ = 2.90 ± 0.33 h⁻¹ Mpc with γ = 1.51 ± 0.14 for the low-luminosity sample, and r₀ = 3.89 ± 0.30 h⁻¹ Mpc with γ = 1.52 ± 0.10 for those with higher luminosities. The right-hand panel of Figure 10 plots the calculated clustering scale lengths against the 2dFGRS results. Grey shaded areas correspond to the first and third luminosity quartiles for each sample. From this it can be seen that, although more weakly clustered overall, the Hicat galaxies exhibit a luminosity dependence of clustering consistent with that observed for optically selected galaxies (Norberg et al. 2001, 2002; Zehavi et al. 2005). Correlation function slopes for the two sub-samples are nearly identical.



5.2. H I Mass

We now divide the sample in two around an H I mass of 10^9.25 h⁻² M☉. This corresponds to a mass ~ … Figure 11 plots the radial velocity distributions for these two samples and their corresponding random catalogues. The projected correlation functions with power-law fits are shown in the left-hand panel of Figure 12. The right-hand panel of Figure 12 compares the H I mass-dependent Hicat scale lengths with the 2dFGRS luminosity-dependent values of Norberg et al. (2002). From the weighted results, clustering of the low H I mass galaxies (r₀ = 3.26 ± 0.23 h⁻¹ Mpc, γ = 1.56 ± 0.11) is only marginally weaker than that of high-mass galaxies (r₀ = 3.65 ± 0.30 h⁻¹ Mpc, γ = 1.51 ± 0.10). At the mass limits examined, H I mass therefore does not provide a robust method for selecting the most strongly clustered objects, as can be done with stellar luminosity. This is consistent with H I gas being depleted, relative to stellar luminosity, in more strongly clustered environments.



5.3. Rotational Velocity 

An alternative parameter available in Hicat to test for clustering dependence is rotational velocity. This is interesting because rotational velocity is the observable quantity most directly linked to the total halo mass. As such, the dependence of galaxy clustering on halo mass can be tested and compared to simulations without having to relate halo properties to alternative observables, such as optical luminosity, which involve more complicated and poorly understood physics. Hicat offers a unique ability to assess this dependence through the availability of 21 cm linewidths for all galaxies in the sample. These are converted to rotational velocities by applying a simple correction for inclination (w₅₀ is the width at 50 per cent of the 21 cm profile peak flux):

$$v_{\rm rot} = \frac{w_{50}}{2\sin i} \tag{16}$$



Inclinations i are determined from the observed Hopcat axial ratio where a good optical counterpart identification is available:

$$\cos i = \left[\frac{(b/a)^2 - (b/a)_{\rm eon}^2}{1 - (b/a)_{\rm eon}^2}\right]^{1/2} \tag{17}$$

Here, (b/a) is the semi-minor to semi-major axis ratio, and (b/a)_eon is the axial ratio of an edge-on spiral galaxy, set to 0.1. Galaxies without good optical identifications are given random inclinations (26 per cent of the sample).
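The inclination correction can be sketched as below. It assumes the oblate-spheroid relation cos² i = [(b/a)² − q₀²]/(1 − q₀²) with an edge-on axial ratio q₀ = 0.1, clips axial ratios below q₀ to edge-on, and, for galaxies without an optical axial ratio, draws random inclinations from isotropic orientations (cos i uniform); that last distribution is our assumption, since the text does not specify it. Function names are hypothetical.

```python
import numpy as np

Q_EDGE_ON = 0.1  # axial ratio adopted for an edge-on spiral

def inclination(b_over_a, q0=Q_EDGE_ON):
    """Inclination (radians) from cos^2 i = ((b/a)^2 - q0^2) / (1 - q0^2).
    Axial ratios below q0 are clipped to q0, i.e. treated as edge-on."""
    q = np.clip(np.asarray(b_over_a, float), q0, 1.0)
    return np.arccos(np.sqrt((q ** 2 - q0 ** 2) / (1.0 - q0 ** 2)))

def rotational_velocity(w50, b_over_a=None, seed=None):
    """v_rot = w50 / (2 sin i). Entries without an axial ratio (None or NaN)
    get a random inclination from isotropic orientations (cos i uniform).
    Note: a face-on galaxy (b/a = 1) has sin i = 0 and an undefined v_rot."""
    w50 = np.asarray(w50, float)
    q = np.full(w50.shape, np.nan) if b_over_a is None else np.asarray(b_over_a, float)
    has_q = np.isfinite(q)
    inc = np.empty(w50.shape)
    inc[has_q] = inclination(q[has_q])
    rng = np.random.default_rng(seed)
    inc[~has_q] = np.arccos(rng.uniform(0.0, 1.0, size=int((~has_q).sum())))
    return w50 / (2.0 * np.sin(inc))

print(rotational_velocity([200.0, 200.0], [0.1, 0.5]))
```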

The threshold rotational velocity used to divide the sample is 108 km s⁻¹, which we roughly estimate corresponds to a sub-halo mass of ~10^11 M☉ (Bullock et al. 2001). The radial velocity distributions of each sub-sample and their corresponding random samples are shown in Figure 13. The left-hand panel of Figure 14 plots the projected correlation functions, with the fits yielding real-space power-law parameters of r₀ = 2.86 ± 0.46 h⁻¹ Mpc with γ = 1.45 ± 0.14 for the low rotational velocity sample, and r₀ = 3.96 ± 0.33 h⁻¹ Mpc with γ = 1.49 ± 0.10 for galaxies with high rotational velocities. As both stellar mass and rotational velocity are correlated with dark matter halo mass, it is not unexpected that spatial clustering increases with both stellar mass and rotational velocity in our sample. The comparison of these results with the 2dFGRS luminosity-dependent values is given in the right-hand panel of Figure 14.



6. Comparison with CDM simulations

Errors in the measured values of r₀ and γ are all computed using the jackknife method. One concern is that many of the large-scale structures seen in Hipass are larger than the total Hipass survey volume. Different jackknife sub-samples are therefore not independent, which might result in an underestimation of the uncertainties. To test this effect, we make use of the Millennium Run (Springel et al. 2005) to construct independent synthetic Hipass volumes.
Croton et al. (2006) use a semi-analytical prescription to assign cold gas masses to individual dark matter halos identified in the Millennium Run. This prescription includes detailed modelling of cooling, star formation, supernova feedback, galaxy mergers and metal enrichment. The cold gas masses include H I, molecular hydrogen (H$_2$) and He. We take the ratio of H I mass to total cold gas mass to be 0.5, which is roughly derived by assuming molecular gas masses to be $\sim 50$ per cent of the H I gas mass (note that considerable variation is observed in the $M({\rm H_2})/M({\rm H\,I})$ ratio; Young & Knezek 1989) and a 25 per cent mass fraction of helium. Within the total $500^3\,h^{-3}$ Mpc$^3$ box we identify 16 independent volumes of $120^3\,h^{-3}$ Mpc$^3$. From each of these boxes we select synthetic HICAT samples by placing an 'observer' on the edge of the box and then using the selection function described in Zwaan et al. (2004) to select galaxies. We select only galaxies with H I masses larger than $M_{\rm HI} = 10^{8.7}\,h^{-2}\,M_\odot$, roughly corresponding to the mass limit of the Millennium Run. We also constructed two further sets of synthetic samples corresponding to the high H I mass ($M_{\rm HI} > 10^{9.25}\,h^{-2}\,M_\odot$) and high luminosity ($B_J < -19.5$) sub-samples. Synthetic catalogues are not made for the other sub-samples, as the corresponding low H I mass and faint galaxy samples are not well represented in the Millennium Run data, and $v_{\rm rot}$ is not directly available in the Croton et al. (2006) database.
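The factor of 0.5 follows from simple bookkeeping: cold gas is H I + H$_2$ + He, so with $M({\rm H_2}) = 0.5\,M({\rm H\,I})$ the hydrogen mass is $1.5\,M({\rm H\,I})$, and hydrogen is $1 - Y = 0.75$ of the total, giving $M({\rm H\,I})/M_{\rm cold} = 0.75/1.5 = 0.5$. The helper below (a hypothetical name) makes the dependence on the assumed molecular fraction explicit.

```python
def hi_fraction_of_cold_gas(h2_to_hi: float = 0.5, helium_mass_fraction: float = 0.25) -> float:
    """Fraction of the total cold gas mass in atomic hydrogen (H I).

    Cold gas = H I + H2 + He. With M(H2) = h2_to_hi * M(HI), hydrogen is
    (1 + h2_to_hi) * M(HI), and hydrogen makes up (1 - Y) of the total,
    so M(HI) / M_cold = (1 - Y) / (1 + h2_to_hi).
    """
    return (1.0 - helium_mass_fraction) / (1.0 + h2_to_hi)


print(hi_fraction_of_cold_gas())  # prints 0.5
```

Because the observed $M({\rm H_2})/M({\rm H\,I})$ ratio varies considerably, the adopted fraction is only approximate; a molecular-to-atomic ratio of 0.2, for example, would give 0.625 instead.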

As a first test we use the bivariate stepwise maximum likelihood method of Zwaan et al. (2005) to construct H I mass functions from the full synthetic samples. We find that these H I mass functions are in excellent agreement with the real HIPASS H I mass function, providing confidence in the generated samples. Croton et al. (2006) also find that their semi-analytical results can accurately reproduce the field optical galaxy luminosity function.

The projected correlation function is calculated for each sample, and negligible errors (0.001) are ascribed to each datapoint for the power-law fitting. A separation limit of $\sigma < 10\,h^{-1}$ Mpc is again applied for the fitting, as was done for the real dataset. From this we find the mean and standard deviation of the derived real-space power-law correlation function parameters to be $r_0 = 3.49 \pm 0.43\,h^{-1}$ Mpc and $\gamma = 1.35 \pm 0.12$ for the full dataset. These are in good agreement with the results from HICAT. For the high luminosity samples we find $r_0 = 3.69 \pm 0.41\,h^{-1}$ Mpc and $\gamma = 1.39 \pm 0.09$, and for the high H I mass samples $r_0 = 3.83 \pm 0.35\,h^{-1}$ Mpc and $\gamma = 1.40 \pm 0.07$. The errors on the correlation length $r_0$ are larger than the corresponding HICAT jackknife errors (by 72 per cent for the full sample, 37 per cent for the high luminosity sample and 17 per cent for the high H I mass sample). This indicates that we may slightly underestimate the HICAT correlation function errors from the jackknife as a result of the relatively small HIPASS volume, while the errors on $\gamma$ are generally consistent. However, even taking these larger error values for $r_0$, our principal conclusions remain unchanged.
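The quoted percentages are ratios of the scatter in $r_0$ across the 16 synthetic volumes to the HICAT jackknife errors. The sketch below reproduces them. It assumes that the HICAT reference errors are those of the weighted samples ($0.25$ for the full sample, and $0.30\,h^{-1}$ Mpc for both the high luminosity and high H I mass samples, as quoted in the Conclusions). The function name is hypothetical.

```python
def percent_larger(sim_err: float, jackknife_err: float) -> float:
    """How much larger (in per cent) the simulation scatter is than the jackknife error."""
    return 100.0 * (sim_err / jackknife_err - 1.0)


# (scatter of r0 over the 16 synthetic volumes, HICAT jackknife error), h^-1 Mpc
r0_errors = {
    "full sample": (0.43, 0.25),
    "high luminosity": (0.41, 0.30),
    "high H I mass": (0.35, 0.30),
}

for name, (sim_err, jk_err) in r0_errors.items():
    print(f"{name}: {percent_larger(sim_err, jk_err):.0f} per cent larger")
# full sample: 72 per cent larger
# high luminosity: 37 per cent larger
# high H I mass: 17 per cent larger
```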
7. Discussion

Our results indicate that H I-rich galaxies are among the most weakly clustered objects known. That the clustering of H I-rich objects is weak is also in first-order agreement with the weaker clustering of galaxies with active star formation (Madgwick et al. 2002), faint galaxies (Zehavi et al. 2005) and late-type galaxies (Norberg et al. 2002). It is also consistent with the weaker clustering of H II galaxies, which are usually gas-rich dwarf systems ($r_0 = 2.7\,h^{-1}$ Mpc, 425 galaxies; Iovino et al. 1988). A number of effects could contribute to the lower observed clustering of H I-rich galaxies compared with the optically selected galaxy population.

One important factor is the effect of environment on H I gas content. It is already well established that there are few H I-rich galaxies near the cores of rich clusters (e.g. Waugh et al. 2002). Processes that could remove H I from galaxies in the densest environments include the stripping of H I gas by tidal effects in galaxy concentrations (e.g. galaxy harassment, Moore et al. 1996, or the overall potential of the concentration), ram pressure stripping (Gunn & Gott 1972) and strangulation (Balogh et al. 2000). An increased rate of tidal interactions may also trigger enhanced star formation (Barton et al. 2000), which in turn depletes the H I gas content of galaxies. It is also worth noting that galaxy properties have been observed to vary at substantial distances from the centers of clusters (Lewis et al. 2002; Gomez et al. 2003; Balogh et al. 2004; Zwaan et al. 2005).

If environmental effects leading to a depletion of H I gas are responsible, one might expect a strongly clustered counterpart population corresponding to those galaxies that have had their H I gas removed. While no direct evolutionary links can be drawn, there do exist galaxy populations that could meet this criterion, such as the $L < L^*/3$ red galaxies identified by Hogg et al. (2003). Interestingly, Norberg et al. (2002) also found stronger clustering for low-luminosity early-type galaxies, although the result was not viewed as significant. These faint red galaxies represent a strongly clustered, low-mass galaxy population with little or no star formation that preferentially resides in the very massive dark matter halos of clusters.

Another possible explanation for the lower observed clustering of H I-rich galaxies is that they form in different, intrinsically less clustered dark matter halos compared with galaxies selected in the optical. As noted by Norberg et al. (2002), the luminosity dependence of clustering is consistent with the results of CDM simulations: the brightest galaxies form in the most clustered and massive dark matter halos (e.g. Benson et al. 2001). Similarly, the dependence of clustering on morphology may reflect a relation between the morphology of a galaxy and the parameters of its halo. In this vein, the lower clustering of H I-rich galaxies may be the result of H I-rich galaxies forming only in the low-to-medium peaks of the initial density field that have yet to be accreted onto the most massive and strongly clustered halos. This possibility has also been suggested to explain the lower clustering observed for low surface brightness galaxies (Mo et al. 1994). Recent results by Gao et al. (2005) also indicate that dark halo clustering is a strong function of halo age, with the youngest halos being the most weakly clustered; moreover, this dependence is found to increase with decreasing mass. As such, if H I-rich galaxies preferentially form in low-mass halos, any tendency toward younger halos would further decrease the strength of H I-rich galaxy clustering.

Our observations that H I-rich galaxies are particularly weakly clustered, and that the clustering strength of galaxies depends on rotational velocity (and by implication halo mass), are consistent with the biasing of H I-rich galaxies toward low-mass halos. Both environmental factors and initial conditions may contribute to this result. Also, the similar clustering dependence of H I-rich galaxies on stellar mass and rotational velocity, compared with the weaker dependence on H I mass, argues for stellar mass being a better tracer of halo mass than H I gas mass.

8. Conclusions 

Existing studies of galaxy clustering find strong dependencies on a number of parameters, highlighting an underlying trend for clustering to be strongest for more luminous galaxies (Norberg et al. 2001, 2002), earlier morphological types (e.g. Loveday et al. 1995) and galaxies with old stellar populations (Norberg et al. 2002; Zehavi et al. 2002; Madgwick et al. 2003).

The low clustering measured for H I-rich galaxies is consistent with these trends, as H I-rich galaxies preferentially have spiral morphologies, active star formation and blue colors. However, the scale length of $r_0 = 3.45 \pm 0.25\,h^{-1}$ Mpc and slope $\gamma = 1.47 \pm 0.08$ place H I-rich galaxies among the most weakly clustered objects known, at the extreme weak end of the observed clustering distribution. Compared with results from the 2dFGRS, H I-rich galaxies are also more weakly clustered than optically selected galaxies of similar luminosities.

Dividing the HICAT sample by H I mass around a threshold of $M_{\rm HI} = 10^{9.25}\,h^{-2}\,M_\odot$, only a very marginal dependence of galaxy clustering strength on H I mass is observed. The scale length for the low H I mass sample is found to be $r_0 = 3.26 \pm 0.23\,h^{-1}$ Mpc and for the high mass sample $r_0 = 3.65 \pm 0.30\,h^{-1}$ Mpc. Dividing the sample instead on the basis of rotational velocity, a stronger dependence is seen. The clustering scale length for galaxies with $v_{\rm rot} < 108$ km s$^{-1}$ is $r_0 = 2.86 \pm 0.46\,h^{-1}$ Mpc, compared with $r_0 = 3.96 \pm 0.33\,h^{-1}$ Mpc for the high rotational velocity sample. This is similar to the luminosity trend, where $r_0 = 2.90 \pm 0.33\,h^{-1}$ Mpc for galaxies with $B_J > -19.5$ and $r_0 = 3.89 \pm 0.30\,h^{-1}$ Mpc for $B_J < -19.5$.
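A rough way to compare the three splits is to divide each difference in $r_0$ by the quadrature sum of the two quoted errors. This is only indicative: it assumes independent Gaussian errors, whereas the sub-samples share the same survey volume, and the Millennium Run test suggests the jackknife errors may be somewhat underestimated, which would reduce these numbers. The values are the weighted-sample results quoted above; the helper name is hypothetical.

```python
import math

# (low-sample r0, error, high-sample r0, error) in h^-1 Mpc, weighted samples
splits = {
    "H I mass": (3.26, 0.23, 3.65, 0.30),
    "luminosity": (2.90, 0.33, 3.89, 0.30),
    "rotational velocity": (2.86, 0.46, 3.96, 0.33),
}


def separation_in_sigma(r_lo: float, e_lo: float, r_hi: float, e_hi: float) -> float:
    """Difference in r0 divided by the quadrature sum of the two errors."""
    return (r_hi - r_lo) / math.hypot(e_lo, e_hi)


for name, vals in splits.items():
    print(f"{name}: {separation_in_sigma(*vals):.1f} sigma")
# H I mass: 1.0 sigma
# luminosity: 2.2 sigma
# rotational velocity: 1.9 sigma
```

On this crude measure the H I mass split is the least distinct, in line with the "very marginal" dependence described above.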

Our results are consistent with galaxy clustering being fundamentally a function of halo mass, which is well traced by stellar luminosity but poorly traced by H I gas mass. In this scenario, H I-rich galaxies preferentially occupy lower mass halos compared with the general galaxy population, accounting for their low clustering strength. Both environmental processes and initial conditions may lead to this effect.

The HIPASS and HICAT teams are acknowledged for their role in planning and executing the programs which created the datasets from which this work is derived. We also thank
Peder Norberg for providing us with the 2dF correlation function results for comparison and 
Darren Croton for his helpful advice on the Millennium Run data. The Millennium Run 
simulation used in this paper was carried out by the Virgo Supercomputing Consortium at 
the Computing Centre of the Max-Planck Society in Garching. The semi-analytic galaxy 
catalogue is publicly available at http://www.mpa-garching.mpg.de/galform/agnpaper. 
Table 1: Measured values of $r_0$ and $\gamma$ for HICAT, 2dFGRS and SDSS galaxies.
References: (1) this work; (2) Norberg et al. (2002); (3) Zehavi et al. (2005).
| Sample | Selection criteria | $r_0$ ($h^{-1}$ Mpc) | $\gamma$ | Galaxies |
|---|---|---|---|---|
| Low H I mass | $M_{\rm HI} < 10^{9.25}\,h^{-2}\,M_\odot$ | 2.63 ± 0.23 | 1.59 ± 0.12 | 2094 |
| Low H I mass (weighted) | $M_{\rm HI} < 10^{9.25}\,h^{-2}\,M_\odot$ | 3.26 ± 0.23 | 1.56 ± 0.11 | 2093 |
| High H I mass | $M_{\rm HI} > 10^{9.25}\,h^{-2}\,M_\odot$ | 3.23 ± 0.27 | 1.45 ± 0.09 | 2082 |
| High H I mass (weighted) | $M_{\rm HI} > 10^{9.25}\,h^{-2}\,M_\odot$ | 3.65 ± 0.30 | 1.51 ± 0.10 | 1727 |
| Low luminosity | $B_J > -19.5$ | 2.51 ± 0.23 | 1.61 ± 0.13 | 2097 |
| Low luminosity (weighted) | $B_J > -19.5$ | 2.90 ± 0.33 | 1.51 ± 0.14 | 2024 |
| High luminosity | $B_J < -19.5$ | 3.21 ± 0.21 | 1.53 ± 0.09 | |
| High luminosity (weighted) | $B_J < -19.5$ | 3.89 ± 0.30 | 1.52 ± 0.10 | |
| Low $v_{\rm rot}$ | $v_{\rm rot} < 108$ km s$^{-1}$ | 2.50 ± 0.24 | 1.61 ± 0.13 | |
| Low $v_{\rm rot}$ (weighted) | $v_{\rm rot} < 108$ km s$^{-1}$ | 2.86 ± 0.46 | 1.45 ± 0.14 | |
| High $v_{\rm rot}$ | $v_{\rm rot} > 108$ km s$^{-1}$ | 3.11 ± 0.20 | 1.56 ± 0.10 | |
| High $v_{\rm rot}$ (weighted) | $v_{\rm rot} > 108$ km s$^{-1}$ | 3.96 ± 0.33 | 1.49 ± 0.10 | 1877 |
| All | | 2.70 ± 0.21 | 1.56 ± 0.10 | 4176 |
| All (weighted) | | 3.45 ± 0.25 | 1.47 ± 0.08 | 3820 |

Table 2: Measured values of $r_0$ and $\gamma$ for high and low H I mass galaxies, high and low luminosity galaxies, high and low rotational velocity galaxies, and the full HICAT sample.
Fig. 1. Radial velocity histogram of the HICAT galaxies (dashed line) compared with the generated random sample (solid line).
Fig. 2. Two-dimensional redshift-space correlation function for HICAT galaxies: unweighted (left) and weighted (right). Lighter shades correspond to higher values of the two-point correlation function.
Fig. 3. Redshift-space two-point correlation function for the HICAT galaxy samples compared with 2dFGRS results. Triangles are the unweighted HICAT results, squares are the weighted HICAT results and circles are those for the 2dFGRS (Hawkins et al. 2003).
Fig. 4. Unweighted and weighted $\xi$ integrals, $\int_0^{\pi} \xi(\sigma, \pi')\,d\pi'$: (left) unweighted, (right) weighted. For clarity, only every alternate integral is shown. Each $\sigma$ bin is plotted in a different line style as specified by the key. The bin centre values are given in units of $h^{-1}$ Mpc.
Fig. 5. Projected two-point correlation function for the HICAT galaxy samples compared with 2dFGRS results. Triangles are the unweighted HICAT results, squares are the weighted HICAT results and circles are those for the 2dFGRS (Hawkins et al. 2003).
Fig. 6. Projected real-space correlation functions (points) with the power-law fits used to obtain $r_0$ and $\gamma$: (left) unweighted, (right) weighted. Fitting is restricted to points with $\sigma < 10\,h^{-1}$ Mpc.
Fig. 7. (left) $B_J$-band absolute magnitude distribution of galaxies in the weighted correlation function sample with HOPCAT (Doyle et al. 2005) counterparts. (right) HICAT correlation length plotted against 2dFGRS results as a function of luminosity (Norberg et al. 2002). The magnitude range of the grey shaded area corresponds to the first and third quartiles of the magnitude distribution at left, and the data point corresponds to the median. Solid squares are the 2dFGRS points for early-type galaxies, triangles are for late-type galaxies and circular points are the results for all types. All 2dFGRS points are plotted at the median of each luminosity bin.
Fig. 8. Real-space correlation function derived using the inversion method (points; Section 4.4.2) compared with the assumed power-law real-space correlation function obtained from the projected correlation function (dotted line; Section 4.4.1): (left) unweighted, (right) weighted. The fits to the projected correlation function, from which the real-space correlation function parameters are obtained via Equations 10 and 11, are shown in Figure 6.
Fig. 9. Low (top) and high (bottom) luminosity sample radial velocity distributions. The threshold luminosity is $B_J = -19.5$. The dashed line shows the distribution from HICAT and the solid line that of the random sample.
Fig. 10. (left) Weighted projected real-space correlation functions with power-law fits. Fitting is restricted to points with $\sigma < 10\,h^{-1}$ Mpc. The threshold luminosity is $B_J = -19.5$. (right) Low and high luminosity correlation lengths plotted against 2dFGRS results as a function of luminosity (Norberg et al. 2002). The magnitude range of the grey shaded area corresponds to the first and third quartiles of the HOPCAT (Doyle et al. 2005) magnitude distribution, and the data point corresponds to the median. Solid squares are the 2dFGRS points for early-type galaxies, triangles are for late-type galaxies and circular points are the results for all types. All 2dFGRS points are plotted at the median of each luminosity bin.
Fig. 11. Low (top) and high (bottom) H I mass sample radial velocity distributions. The threshold H I mass is $10^{9.25}\,h^{-2}\,M_\odot$. The dashed line shows the distribution from HICAT and the solid line that of the random sample.
Fig. 12. (left) Weighted projected real-space correlation functions with power-law fits. Fitting is restricted to points with $\sigma < 10\,h^{-1}$ Mpc. The threshold mass is $M_{\rm HI} = 10^{9.25}\,h^{-2}\,M_\odot$. (right) Low and high H I mass correlation lengths plotted against 2dFGRS results as a function of luminosity (Norberg et al. 2002). The magnitude range of the grey shaded area corresponds to the first and third quartiles of the HOPCAT (Doyle et al. 2005) magnitude distribution, and the data point corresponds to the median. Solid squares are the 2dFGRS points for early-type galaxies, triangles are for late-type galaxies and circular points are the results for all types. All 2dFGRS points are plotted at the median of each luminosity bin.




Fig. 13. Low (top) and high (bottom) rotational velocity sample radial velocity distributions. The threshold rotational velocity is 108 km s$^{-1}$. The dashed line shows the distribution from HICAT and the solid line that of the random sample.
Fig. 14. (left) Weighted projected real-space correlation functions with power-law fits. Fitting is restricted to points with $\sigma < 10\,h^{-1}$ Mpc. The threshold rotational velocity is 108 km s$^{-1}$. (right) Low and high rotational velocity correlation lengths plotted against 2dFGRS results as a function of luminosity (Norberg et al. 2002). The magnitude range of the grey shaded area corresponds to the first and third quartiles of the HOPCAT (Doyle et al. 2005) magnitude distribution, and the data point corresponds to the median. Solid squares are the 2dFGRS points for early-type galaxies, triangles are for late-type galaxies and circular points are the results for all types. All 2dFGRS points are plotted at the median of each luminosity bin.



